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1  Summary 

Level  Set  Systems  and  the  Rice  CAAM  group  have  continued  to  test  the  state  of  the  art  target  detec¬ 
tion/template  matching  method  based  on  LI  minimization.  Given  the  spectral  signature  of  a  material,  we 
are  able  to  identify  the  pixels  in  a  hyperspectral  image(HSI)  that  contains  the  material.  A  new  data  set  (from 
RIT)  was  used  to  test  the  accuracy  of  the  algorithm  with  very  good  results.  The  speed  of  our  unmixing 
algorithm  has  been  improved  and  a  comparison  with  the  standard  nonnegative  least  squares  is  given.  We 
have  also  expanded  the  use  of  the  Bayesian  dictionary  learning  and  sparse  reconstruction  method  by  utiliz¬ 
ing  spatial  inter-relationships  between  different  components  in  images  and  trying  to  incorporate  sparsity  of 
spectral  vectors  in  terms  of  sparse  representation  by  endmembers  into  reconstruction. 


2  Improved  Unmixing 

A  hyperspectral  image  spect  ral  can  be  composed  of  a  relat  ively  small  number  of  materials.  Different  materials 
have  different  hyperspectral  signatures,  referred  to  as  endmembers.  Due  to  the  usual  low  resolution  of  HS1, 
the  pixels  often  contain  a  mixture  of  materials.  However,  it  is  often  the  case  that  only  a  few  materials  are  in 
any  given  pixel.  Umnixing  is  the  process  of  determining  which  materials /endmembers  are  in  a  given  pixel. 
A  hyperspectral  image  is  represented  by  a  three-dimensional  data  cube  M .  There  is  an  image  for  every  band 
in  the  hyperspectral  cube.  Each  image  has  m  rows  and  n  columns.  There  are  b  spectral  bands  in  the  data 
cube.  Formally, 

M  6  RmXflxb,  M  >  0. 

Let  /  6  S1  be  one  of  the  pixels  of  the  data  cube  M,  and  E  =  [ei  e 2  •••  e/f],  where  e,  are  the 
endmembers(either  given  or  computed).  We  will  assume  that  a  pixel  is  composed  of  a  linear  combination  of 
the  endmembers,  i.e., 

K 

f  =  y>,e,-  =  Ex. 

i-\ 

If  we  assume  that  only  a  few  endmembers  are  contained  in  a  pixel,  then  we  want  the  coefficient  vector  x  to 
be  sparse.  Additionally,  we  require  x  >  0,  since  there  can  only  be  a  positive  amount  of  a  material  in  a  pixel. 
Following  [1],  the  original  t\ -minimization  model  w'e  used  was 

min  \x\i  4-  ^\\Ex  -  f\\\,  s.t.  x  >  0. 

x  l 

Using  the  Split  Bregman  method  [2]  gives  the  following  algorithm: 


1  Initialization  rr°  =  6°  =  d°  =  0,  /°  =  /.  We  define  7  =  1/A  ; 

2  for  k  <-  1  to  nOuter  do 

3  for  r  4-  1  to  nlnner  do 

4  6n+1  =6n  +  an-dn; 

5  in+1  =  (£t£  +  7/)-1  (ETfk-bn+'  +dn); 

6  dn+1  =  max  (an+1  4-  6n+1  -  ji,  0); 

7  end 

8  /*+ 1  =  /*  4-  /  —  Ex™*1  ; 

9  end 


Numerically,  the  above  algorithm  gives  essentially  the  same  solution  as  the  nonnegative  least  squares 
problem  minx  \\Ax  —  f  ||2  s.t.  x  >  0.  However,  the  nonnegative  least  squares  implementation  in  Matlab  was 
significantly  faster  than  the  above  algorithm.  In  order  to  speed  up  the  computation,  we  consider  a  different 
implementation  of  the  following  model  [3]: 

min  rf\\x\\\  4-  \\Ex  -  /||2,  s.t.  x  >  0, 

X 

and  rf  is  a  positive  constant.  The  above  is  equivalent  to  the  the  following  problem 

min*,*  »?£x  +  ||Ex-/||2; 
x  =  P(d), 


where  P  is  a  component  wise  operator  defined  as 


a  >  0 
otherwise, 


The  constrained  problem  (1)  is  replaced  by  the  following  sequence  of  unconstrained  problems: 

d*+I,xfc+l  =  minx,d  XtjYI  x  +  A|| Ex  -  f\\2  4-  \  \x  -  P(d)  -  6*||2, 
bk+l  =  bk  +  P(dk+l)  -xk+l, 

where  A  is  a  user  defined  parameter. 

The  first  subproblem  can  also  be  split  into  two,  giving 

d/c+1  =  mind  Ar/^2’4-  A||E:e*  -  /||2  4-  \\xk  -  P(d)  -6*||2, 
xk+1  =  minx  \tj^Tx  +  A|| Ex  -  f  ||2  4-  \\xk  —  P(dk+')  -  6*||2. 
bk+ 1  =  bk  4-  P(dk+l)  -  xk+l, 


Each  subproblem  can  now  be  solved  exactly  using  the  iterations: 
dk+l  =  P(xk  -bk), 

xk+'  =  (A EtE  4-  /)-1(P(dfc+l )  4-  bk  4-  A ET f  -  A t/7), 
bk+l  =bk  4-  P(dk+l)  -  xk+l. 


(2) 


(3) 


(4) 


(5) 


2.1  Experimental  Results 

We  applied  the  improved  algorithm  to  a  dataset  of  Moffet  Field,  CA  obtained  from  NASA  [4].  It  has  spatial 
dimensions  of  500x614  and  224  spectral  bands.  Six  endmembers  were  chosen  with  parameters  A  =  |^^|| 
and  77  =  1.  Results  are  given  in  Figures  1-6.  The  improved  algorithm  also  has  a  significant  edge  over  the 
nonnegative  least  squares  method  of  Matlab  in  computational  time.  The  table  below  gives  comparisons  of 
the  computational  time  for  4  different  hvperspectral  data  sets  in  addition  to  the  Moffet  Field  data.  The 
other  data  sets  include  the  URBAN  set  from  [5],  and  3  data  sets  provided  to  us  bv  Major  Fay  Spellerberg 
of  the  USAF. 
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Data 

Spatial  Dim. 

Spec.  Bands 

#  of  endmembers 

Time  for  NNLS(s) 

Time  for  LI  unmixing(s) 

Moffet  Field 

500x614 

224 

6 

246 

42 

URBAN 

307x307 

163 

6 

76.4 

12.2 

AF  Oil 

573x256 

210 

4 

102.5 

19.1 

AF  012 

573x256 

210 

4 

98.7 

19.0 

AF  016 

573x256 

210 

4 

109.9 

19.0 

Figure  1 :  Unmixed  data 


3  Anomaly /Target  Detection 

We  have  further  tested  out  anomaly/ target  detection  algorithm  which  is  summarized  as  follows.  Given  a 
HSI  with  NxN  pixels  and  M  spectral  bands,  we  wish  to  locate  the  positions  of  pixels  that  correspond  to  a 
given  spectral  signature  /,  which  also  has  M  spectral  bands.  We  rearrange  A  as  an  M  x  Ar 2  matrix,  where 
generally  M  <  N2.  The  signals  are  the  columns  of  this  spectral  matrix  A  and  correspond  to  each  pixel 
in  the  image. 

Our  goal  is  to  find  u  €  RN  by  solving  the  constrained  minimization  problem 

u=argmin  |u|a  s.t.  \\Au  -  f\\  <  6, 

n  >  0.  W 

where  S  is  a  measure  of  the  noise  in  the  system. 

To  solve  this,  we  apply  Bregman  iteration  [6,  7],  by  solving  a  sequence  of  unconstrained  minimization 
problems. 


un+1  =  argmin 


(HuIi  +  ^ 


\\Au-n 


(7a) 
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Figure  2:  Unmixod  data 


/»  =  /"->+/-  Aun-'  (7b) 

for  n  =  1,2, ...,  with  u°  =  0.  The  constant  A  is  usually  chosen  around  A  =  •  \\An  —  f" ||2  monotonically 

decreases  to  zero  and  un  converges  very  quickly  to  a  solution  of  (6)  with  <5  =  0,  see  [7,  6], 

It  now  becomes  a  matter  of  solving  (7a)  and  (6)  efficiently.  We  propose  two  recently  developed  algorithms 
(1)  Split  Bregman[2]  for  (7a)  and/or  (6),  linearized  Bregman  for  (6)  (8,  9,  10,  7).  The  idea  behind  both  of 
these  is  quite  simple.  There  are  two  simple  minimization  problems  to  be  solved.  A  combination  of  these 
solvers  will  converge  to  the  desired  solution,  as  we  described  before.  To  solve 


argmiv  ^ i  |u|,  +  1  ||u  -  f\\‘ 
we  have  the  following  well  known  shrinkage  formula 


Moreover,  if  we  add  the  constraint  that  n,  >  0,  then 

iii  =  shrink+(fi,n)  = 


(8) 


To  solve 


for  a  fixed  vector  d  we  have 


f  h  -  l‘  if  h  >  /< 

0  if  |/i|  <  H 

[  f,+g  if  f,  <  -ii 

(9) 

1  ft  ~  H  if  h  >  /< 

1  0  if  ft  <  // 

(10) 

(11) 

)"'  (A ATf  +  d) 

(12) 
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Figure  3:  Unmixed  data 


The  idea  behind  split  Bregman  is  s  follows:  We  replace  the  problem  (7a)  by  a  sequence  of  approximations 
generated  by  Bregman  iteration: 

f  (dk+1,Uk+1)  =  argminuld^  +  |  ||/h/  -  f\\2  +  \  ||d  -  U  -  b*||2  fn1 

\  bk  =  bk~'  +Uk  -  dk~'1  '  ’ 

The  steps  used  in  the  solution  for  (13a)  and  (13b)  involve  splitting 

J  uk+1  =  (XAT  A  +  /)■'  (XAT /"  -  bk  +  dk) 

\  dk+i  =  shrink  (Uk+l  +  bk+i  ,fi)  1 

Uk  approaches  un+1  monotonically,  || lJk  —  \  0,  and  of  course,  ||d  —  Uk\\  \  0.  Thus  we  use  an  inner 

iteration  to  obtain  the  sequence  Uk ,d* ,  which  converges  to  the  updated  u.  We  then  update  using  (7)  to  get 
/n+1  and  repeat  the  inner  iteration  to  get  un+2 .  This  procedure  is  very  efficient.  The  number  of  of  inner 
iterations  needed  is  problem  dependent,  but  usually  between  5  and  10. 

Alternatively,  we  may  use  the  linearized  Bregman  approach  [8,  9,  10,  7]  to  solve  (6)  directly.  This  involves 
a  very  simple  2  line  code  and  has  the  advantage  that  we  need  not  compute  the  matrix  inverse  appearing  in 
(14a).  However,  it  is  a  bit  slower  for  bigger  matrices  A.  The  entire  algorithm  is  as  follows: 


J  nk+1  =  Sshrink  (yk,n) 

\  vk+]  =  vk  -f  A1  (/  —  Avk+}) 


for  v°  =  0,  k  =  0, 1,...  with  \AAJ  <  f,  6  >  0. 

For  fi  sufficiently  large,  this  converges  to  a  solution  of  (6),  see  [11].  However,  in  general  the  solution  u 
satisfies 

1 

+  25/1 

such  that  Au  =  /. 


u  =  argmvn  \u 


n 
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Figure  4:  Unmixed  data 


3.1  Experimental  Results 

3.1.1  RIT  Data 

We  obtained  some  HSI  data  from  Rochester  Institute  of  Technology  (RIT)  [12]  to  test  our  target  detection 
algorithm.  The  first  set  of  targets  consisted  of  6  pieces  of  cloth,  made  of  nylon  or  cotton,  and  of  different 
sizes  and  colors.  We  ran  the  target  detection  algorithm  for  each  of  the  6  cloth  targets,  and  used  the  ground 
truth  location  as  the  spectral  signature  for  each  target.  A  80x80x126  subset  of  the  whole  data  set.  which  had 
size  280x800x126,  was  used  for  the  computation.  Figure  9  shows  the  ground  truth  locations  of  the  6  pieces 
of  cloth  and  Figure  10  shows  the  computed  locations  of  the  targets.  The  locations  were  exactly  computed 
with  no  false  pixels  identified. 

A  second  set  of  targets  consisting  of  3  vehicles  was  also  used  to  test  the  algorithm.  Each  vehicle  was 
detected  in  a  separate  run  of  the  algorithm,  and  the  ground  truth  locations  of  the  targets  were  used  as  the 
spectral  signatures.  Similar  to  the  first  test,  we  used  a  subset  of  size  80x80x126  for  the  experiment,  and  the 
locations  of  the  vehicles  were  exactly  detected  with  no  falsely  identified  pixels.  Figure  7  show's  the  ground 
truth  locations  of  the  3  vehicles  and  Figure  8  shows  the  computed  locations  of  the  targets. 

3.1.2  Detecting  Targets  on  the  Whole  Data  Set 

Running  our  target  detection  algorithm  on  the  entire  280x800x126  HSI  would  have  been  prohibitive  due 
to  large  memory  requirements.  We  are  currently  testing  methods  to  overcome  this  limitation.  One  such 
method  is  to  break  up  the  data  cube  into  pieces  spatially,  and  run  the  algorithm  on  each  piece.  However, 
numerical  experiments  revealed  that  if  the  subset  of  the  data  cube  does  not  contain  the  target,  the  algorithm 
may  falsely  identify  some  pixels  as  targets.  The  reason  for  this  is  that  the  solution  to  (6)  tends  to  produce 
solutions  u  ^  0,  even  though  that  is  the  ideal  solution  if  the  target  /  is  not  present  in  the  data  cube.  One 
possible  solution  to  this  problem  is  to  slightly  alter  the  matrix  of  pixels  A,  by  adding  a  column  fs  at  the  end. 
fs  =  f  - h  6 ,  where  <5  is  some  noise.  Then  the  new  matrix  becomes  A  =  [A  :  fs]  and  has  size  M  x  (AT2  +  1). 
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Figure  5:  Unmixed  data 


Then  we  solve  the  altered  problem 


u  =  argmin  |tz|j  s.t .  j|y4t/  —  /|  <  <$, 

XL  >  0, 

If  A  does  not  contain  the  target  /,  the  solution  should  be  xl  =  (0, . . .  ,  0,a),  where  a  >  0.  We  then  take  the 
solution  to  the  original  problem  as  u*  =  iq,  i  =  l,...,Ar2.  If  A  does  contain  the  targets,  the  addition  of 
fs  to  the  end  of  A  does  not  effect  the  identification  of  the  real  targets.  We  tested  this  on  one  of  the  cloth 
targets,  result  shown  in  Figure  11,  and  on  one  of  the  vehicles,  shown  in  Figure  12.  As  seen  in  the  figures,  the 
targets  were  detected  exactly  with  no  falsely  identified  pixels.  Each  of  the  subsets  were  of  size  40x40x126, 
and  a  total  of  140  calculations  w’ere  required  to  process  the  entire  280x800x126  data  cube.  The  calculated 
threshold  value(described  in  the  section  below')  for  these  was  c  =  0,  so  falsely  identified  pixels  were  not  a 
problem  for  this  particular  data  set.  Noise  was  also  not  a  problem  for  this  data  set,  so  we  set  S  =  0. 

3.1.3  Thresholding 

Our  previous  experiments  have  shown  that  the  number  of  falsely  identified  pixels  can  be  reduced  by  thresh¬ 
olding  the  values  of  Ui  >  0.  Therefore,  wre  consider  pixel  i  as  a  falsely  identified  target  if  Ui  <  t  for  some 
threshold  value  t  >  0.  So  far,  we  have  determined  the  threshold  value  t  by  trial  and  error,  but  we  present 
a  method  for  computing  it  automatically.  Let  us  =  {u*  :  ux  >0}.  Next,  sort  the  values  of  u 9  in  descending 
order.  Our  numerical  experiments  have  shown  that  the  values  u,  of  the  falsely  detected  pixels  are  general  1\ 
much  smaller  that  the  values  of  the  correctly  identified  pixels.  This  suggest  that  the  threshold  should  be 
where  the  largest  jump  is  in  us .  As  an  example,  we  consider  the  real  HSI  data  provided  to  us  by  NGA, 
which  was  analyzed  in  our  second  report.  More  specifically,  we  look  at  the  plot  of  u8  for  the  gravel  target, 
shown  in  Figure  13.  It  is  clear  where  the  biggest  jump  in  us  is  and  suggests  that  the  threshold  value  in  this 
case  should  be  0.0079. 


7 


Figure  6:  Unmixed  data 


4  Bayesian  Dictionary  Learning  for  Sparse  linage  Representations 
and  Reconstruction 

4.1  Summary 

We  applied  the  recent  Bayesian  dictionary  learning  method  [13]  to  reconstruct  hyperspectral  images  from 
very  few  observations.  The  Bayesian  dictionary  learning  method  models  a  hyperspectral  image  as  the 
sparse  sum  of  a  set  of  dictionary  atoms,  which  are  assumed  to  follow  certain  distributions  and  learned 
by  statistical  inference  from  partial  observations  of  the  hyperspectral  image.  The  size  of  dictionary  can 
be  inferred  nonparametrically.  For  reconstructing  hyperspectral  images  from  very  few  samples,  no  prior 
knowledge  of  the  noise  variance  needs  to  be  assumed,  and  the  noise  variance  can  also  be  non-stationary. 
We  have  utilized  spatial  inter-relationships  between  different  components  in  images  and  tried  to  incorporate 
sparsity  of  spectral  vectors  in  terms  of  sparse  representation  by  endmembers  into  reconstruction.  Numerical 
experiments  using  real  hyperspectral  were  performed,  with  very  good  results. 

4.2  Sparse  Image  Representation 

Image  reconstruction  and  analysis  are  based  on  how  images  are  represented.  In  the  standard  representation, 
a  natural  signal  is  treated  as  an  array  of  pixels  in  space  or  time.  This  is  convenient  for  digitally  sampling, 
display  or  playback.  However,  it  is  inefficient  for  many  reconstruction  and  analysis  tasks. 

A  more  meaningful  representation  shall  describe  the  useful  characteristics  of  the  signal:  for  reconstruction 
from  noisy  measurements,  the  representation  should  efficiently  separate  signal  and  errors;  for  compression, 
the  representation  should  capture  a  large  part  of  the  signal  with  a  sparse  or  compressible  set  of  coefficients; 
for  analysis  such  as  decomposition  and  recognition,  the  representation  should  highlight  salient  features.  They 
seem  to  be  different  goals  but  they  all  look  for  a  sparse  representation  of  features. 

One  way  of  such  representation  involves  the  choice  of  a  dictionary,  which  is  the  set  of  elementary  signals  C 
or  atoms  C  used  to  decompose  the  signal.  Consider  a  signal  x  G  Rn  and  a  fixed  dictionary  D  =  [dl  d?  ■  •  dM ], 


8 


Band  20 


Figure  7:  Ground  truth  locations  of  the  3  vehicles. 


where  each  dm  €  Rn.  We  wish  to  have  such  a  D  that  x  may  be  well  approximated  by  x  =  Da.  In  the  simplest 
case  the  dictionary  is  orthogonal.  Examples  include  the  discrete  cosine  basis  and  various  wavelets  based 
bases.  They  have  been  thoroughly  studied  and  widely  considered  in  applications  because  they  are  easy  to 
analyze  and  they  have  fast  numerical  implementations.  However,  they  are  over-simplistic  for  certain  real  data 
including  hyperspectral  imagery.  To  find  sparser,  meaningful  representations  for  more  signals,  researchers 
have  recently  developed  noil-orthogonal  dictionaries,  some  of  which  are  overcoinplete  (i.e.,  M  >  n)  and/or 
trained,  as  opposed  to  analytic. 

The  earliest  major  work  in  dictionary  training  is  due  to  Olshausen  and  Field  [14],  who  trained  an  over¬ 
complete  dictionary  for  sparsely  representing  small  image  patches  of  a  set  of  natural  images.  Remarkable 
results  were  obtained  from  a  simple  algorithm,  namely,  the  atoms  in  the  trained  image  were  very  similar 
to  the  simple  cell  receptive  fields  in  early  vision.  What  is  remarkable  in  this  finding  is  that  sparsity  plays 
a  key  role  in  biological  visual  behavior.  It  suggests  the  potential  of  sparse  representation  in  uncovering 
fundamental  features  in  complex  signals. 

Inspired  by  Olshausen  and  Field’s  and  others  work  [14,  15,  16,  17],  especially  the  recent  work  by  Lawrence 
Carin’s  group  [18,  13],  we  wish  to  impose  that  the  coefficients  nt-  in  the  representation  x  =  Da  are  sparse. 
With  a  proper  D,  the  computation  of  sparse  a  is  robust  to  noise  and  numerically  tractable  even  when 
x  is  partially  or  indirectly  observed  via  a  small  number  of  measurements  (as  arising  in  problems  such  as 
inpainting,  interpolation  and  compressive  sensing).  The  recent  work  [13]  is  different  from  all  previous  work 
in  the  following  aspects.  First,  when  D  is  given,  previous  work  computes  a  as  a  point  estimate  (i.e.,  a 
vector)  but  [13]  returns  a  posterior  distribution  through  statistical  inference.  Second,  to  return  the  point 
estimate  of  a,  previous  work  uses  algorithms  such  as  basis  pursuit  and  orthogonal  matching  pursuit,  for 
which  the  parameters  and  stopping  criteria  are  defined  based  on  assumed  knowledge  such  as  noise  variance 
or  the  sparsity  of  true  a.  However,  the  noise  variance  can  be  inferred  in  [13].  Last,  to  learn  the  dictionary 
D ,  its  size  M  is  fixed  but  the  statistical  inference  method  does  not  require  this  assumption. 
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Located  Targets 


Figure  8:  Found  locations  of  the  3  vehicles. 


4.3  The  Dictionary  Model 

Specifically,  the  method  we  use  is  based  on  the  Beta  process  [18]  applied  to  the  model  xl  =  Da 1  4-  e*  [13], 
where  each  xl  G  Rn  is  a  hvperspectral  image,  D  =  [dl  d2  •  •  dM]  £  RnxA/  is  a  dictionary,  a1  €  RA/  is  a 
sparse  vector,  and  (l  €  Mn  is  noise.  A  hierarchical  model,  which  is  a  dependence  graph  of  random  variables, 
is  assumed: 


dm 

rsj 

J\f(0,  P“l/p), 

(18) 

a 1 

z*Qs\ 

(19) 

z* 

rsj 

n^=1  Bernoulli  (7rm) 

(20) 

rsj 

Rpta(ao/A',6o(A'  —  1)/A') 

(21) 

S* 

r>*j 

-V(0,7s-1^k) 

(22) 

rsj 

(23) 

75 

rsj 

T(co.do) 

(24) 

7e 

rsj 

r(c0,/o). 

(25) 

Here  each  atom  dm  follows  a  Gaussian  distribution.  Each  vector  a*  is  the  Hadainard  (component-wise) 
product  of  a  0/1  Bernoulli  vector  zl  and  Gaussian  vector  sl.  zl  defines  which  the  atoms  of  the  dictionary  are 
used  to  represent  image  x\  and  wl  contains  the  representation  weights.  By  construction,  a1  is  sparse  since 
zl  is  generated  from  the  Beta  process,  which  is  reviewed  in  the  next  paragraph.  This  is  different  from  the 
common  Laplace  prior  [19],  which  leads  to  many  small  but  often  nonzero  coefficients.  Weight  s*  and  noise 
e 1  follow  normal  distributions  parametrized  by  75  and  7e,  respectively.  Since  there  is  full  conjugacv  in  the 
hierarchical  model,  inference  from  given  observations  of  xl  can  be  quickly  computed  by  variational  Bayesian 
[20]  or  Gibbs-sampiing  analysis,  with  analytic  update  equations. 

The  two-parameter  beta  process  (BP)  was  introduced  in  [18].  Although  it  is  allowed  for  K  — >  oo.  we 
assume  a  finite  K  for  simplicity  in  this  review.  Let  [0, 1]  be  evenly  divided  into  K  bins.  The  kth  bin  is 
denoted  by  the  interval  Bk  =  j±}.  For  each  k,  we  sample  7 r*  ~  Beta(^,  -^~^),  where  such  a  beta 
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Figure  9:  Ground  truth  locations  of  the  6  pieces  of  cloth. 


distribution  takes  values  over  [0, 1]  and  has  a  [/-shaped  probability  density  function.  Using  and  /?*.,  we 
generate  a  new  process  H(B)  :=  Ylk= l  nk$Bk(B),  ^here  equals  one  if  B  —  Bk  and  zero  otherwise. 

In  other  words,  nk  =  //(//*),  k  —  1,.. . ,  A',  are  a  series  of  numbers  between  0  and  1.  The  left  plot  below 
shows  an  example  from  [18]  where  K  =  1000.  Except  for  a  few  nonzeros,  most  locations  of  n  are  zero. 
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Then,  tjVs  are  used  as  parameters  to  sample  a  series  of  0/1  Bernoulli  numbers  Zk  ~  Bernoulli^*),  i.e.,  Zk  =  1 
with  probability  n k  and  Zk  =  0  with  probability  1  —  7T*.  Four  independent  samples  of  z  are  shown  in  the 
right  plot  above,  which  were  drawn  with  the  same  n  shown  in  the  left  plot.  Obviously,  they  are  sparse,  and 
there  are  repeats  of  l’s  at  the  locations  where  7r*’s  are  relatively  large.  By  marginalizing  7r,  it  can  be  shown 
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Located  Targets 


Figure  10:  Found  locations  of  the  6  pieces  of  cloth. 


that  the  total  number  of  nonzeros  ||2||o  of  z  follows  a  Poisson  distribution  with  parameter  a/b.  Furthermore, 
for  any  set  of  M  vectors  {zl , . . .  ,  2A/},  the  number  of  unique  locations  of  l’s  follow  the  Poisson  distribution 
with  the  parameter  YliL\  b+t-i'  ^ave  desired  jointly  sparse  rP’s.  one  shall  adjust  a  and  6. 

4.4  Hyperspectral  Image  Reconstruction  and  Denoising 

We  apply  the  model  xl  =  Da1  4-  e1  to  a  hyperspectral  image,  where  xns  and  the  atoms  in  I)  are  small  3D 
blocks  (corresponding  to  patches  for  21)  images).  Take  a  150  x  150  x  210  hypercube  for  example,  where  210 
is  the  number  of  bands.  A  block,  for  instance,  can  be  3  x  3  x  210.  The  set  of  atoms  contains  multiple  such 
blocks,  each  forming  a  vector  dm.  The  entire  hyperspectral  cube  is  decomposed  to  overlapping  3  x  3  x  210 
blocks,  each  being  ax*.  As  such,  all  blocks  of  the  hyperspectral  image  are  modeled  as  linear  combinations 
of  the  same  set  of  atoms,  and  their  combination  coefficients  are  jointly  sparse  to  a  large  extent  .  When  there 
are  missing  voxels,  the  corresponding  rows  of  xl  =  Da1  +  f  are  also  missing.  In  case  the  observations  are 
noisy,  the  hyper-prior  7t  of  e  is  drawn  from  a  lion-informative  gamma  distribution. 

To  recover  the  hyperspectral  image,  Gibbs  inference  is  applied  to  all  blocks  xl  =  Da1  +  el  from  given 
incomplete  observations.  Note  that  the  atoms  in  /),  all  a1,  as  well  as  noise  el  are  simultaneous  inferred. 
The  computation  is  relatively  fast  using  the  analytic  inference  update  equations,  which  exist  due  to  full 
conjugacv. 

Numerical  Results.  We  tested  the  above  algorithm  on  a  subset  of  the  URBAN  data.  A  reconstruction 
using  5%  of  the  150x150  spatial  dimensions  and  163  spectral  bands  was  performed.  Figures  14  and  15  show 
the  results  of  the  reconstruction.  To  check  the  quality  of  the  reconstruction,  we  ran  our  target  detection 
algorithm  on  the  reconstructed  hypercube.  A  comparison  is  shown  in  Figure  16. 

4.5  Integrated  Endmember  Recovery  and  Hyperspectral  Image  Reconstruction 

This  is  a  project  that  we  have  partially  done  and  will  continue  in  the  next  phase  of  this  contract.  The  above 
model  x1  =  Da1  4-  c*,  which  uses  image  patches  as  dictionary  atoms,  does  not  take  advantage  of  the  fact 


12 


Ground  truth  for  cloth 


Figure  1 1 :  The  top  plot  shows  the  whole  data  set  and  the  ground  truth  location  of  the  cloth(red),  and  the  bottom  shows  the 
result  of  the  detection  of  the  cloth 


that  all  hyperspectral  vectors  are  generated  from  a  small  number  of  endmembers.  This  is  another  nature  of 
sparsity.  We  have  been  looking  for  ways  to  exploit  this  property  for  better  performance. 

The  new  model  is  =  Eft  +  z?  for  each  spatial  pixel  j  and  ft  —  Do*  +  p  for  each  patch  i  of  0.  In 
the  first  set  of  equations,  E  is  a  dictionary  of  endmembers  that  form  all  the  hyperspectral  vectors.  is  the 
hyperspectral  vector  at  pixel  j,  which  is  assumed  to  be  a  linear  combination  of  the  endmembers,  and  e*  is 
the  error.  In  reality,  the  combination  is  nonlinear  due  to  absorption,  etc.  This  will  be  considered  in  i?  as  a 
part  of  the  future  work.  The  set  of  coefficients  {ft}  form  another  3D  cube,  which  has  the  same  number  of 
spatial  pixels  as  the  original  hyperspectral  cube  but  a  much  smaller  number  of  layers  (the  3rd  dimension) 
equal  to  the  number  of  endmembers.  In  other  words,  the  first  set  of  equations  relate  the  larger  3D  cube  x 
to  the  smaller  3D  cube  0  through  endmembers.  Then,  in  the  second  set  of  equations  above,  the  dictionary 
model  is  applied  to  0.  Each  0X  is  not  a  vector  but  a  cube-like  patch.  It  is  assumed  to  be  a  sparse  linear 
combination  of  a  set  of  atoms  given  in  the  dictionary  D .  The  dictionary  /),  coefficients  a*,  and  errors  P  are 
to  be  inferred  from  a  given  observation  in  x.  In  other  words,  what  is  described  in  the  previous  subsections 
is  applied  to  0. 

Preliminary  Numerical  Results.  Eventually,  we  plan  to  learn  the  endmembers  E  from  the  data 
at  the  same  time  D ,  a,  and  e  are  learned.  Modeling  nonlinearity  is  also  in  our  agenda.  Our  preliminary 
results  are  based  on  approximate  endmembers,  which  are  obtained  using  the  algorithm  VGA  [21].  So  at  this 
moment,  E  is  learned  offline  and  not  state-of-the-art. 

In  the  test,  the  original  cube  is  150  x  150  x  163.  VCA  gave  6  endmembers,  so  0  is  150  x  150  x  6  Two 
sets  of  images  are  shown  in  Figure  17  (a)  and  (b),  corresponding  to  spectral  bands  1  and  6,  respectively. 
The  first  columns  show  the  sample  points  on  bands  1  and  6,  which  are  smeared  by  noise  (SNR  7.0685dB). 
The  recovered  cube  has  an  SNR  14.8368dB.  In  other  words,  not  only  the  cube  is  recovered  from  very  few' 
samples,  but  the  SNRs  are  doubled.  At  this  time,  the  work  uses  approximate  endmembers.  We  plan  to 
integrate  endmember  recovery  and  target  detection  seamlessly  into  this  computation. 
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Ground  Truth  for  vehicle 


Figure  12:  The  top  plot  shows  the  whole  data  set  and  the  ground  truth  location  of  the  vehicle(red),  and  the  bottom  shows 
the  result  of  the  detection  of  the  vehicle. 
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Figure  15:  A  plot  of  band  150  of  the  corrupted  data,  reconstructed  data,  and  original  data. 
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Corrupted  image,  7  0685dB  Restored  image,  14  8367dB 


(b)  Band  6 


Original  image 


Figure  17:  Left  to  right:  Very  noise  samples,  recovered,  and  original.  Note  that  the  sample  points  (left 
column)  themselves  have  a  low  SNR  7.0685dB.  The  recovered  cube  (middle  column),  including  all  voxels, 
has  more  than  doubled  SNRs. 
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